Parsing a Probabilistic Dependency Grammar
Author
Abstract
This note discusses the current development of an approach to parsing designed to overcome some of the problems of existing parsers, particularly with respect to their utility as probabilistic language models. The parser combines lexical and grammatical constraints into a uniform grammatical representation and is readily trainable (since the parser output is indistinguishable from the grammar input). A key issue is what path of generalization to use for rare and unseen constructions.

The problem with parsers

A parser is a device that provides a description of the syntactic phrases that make up a sentence. For a speech understanding task, the parser has two roles. First, it should provide a description of the phrases in a sentence so that these phrases can be interpreted by a subsequent semantic processor. Second, it should provide a language model, a model of the likelihood of a sentence, to constrain the speech recognition task. Unfortunately, existing parsers developed for text fulfill neither of these roles very well, for several reasons.

The Lexicality Problem

Typically, parsers have provided no way to express the constraints among individual words (though that is changing; cf. Schabes 1988, 1992; Bod 1992). Yet it is clear that much of our knowledge of language has to do with which words go together (Church et al. 1991). Merely knowing the grammatical rules of the language is not enough to predict which words can go together. (This fact accounts for the relative efficacy of n-gram language models compared with, say, context-free grammars.) For example, general English grammatical rules admit premodification of a noun by another noun or by an adjective.
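The aside about n-gram efficacy can be made concrete with a toy sketch. The corpus below is invented for illustration (it is not the ATIS data): a bigram model directly records which particular words follow which, so an unattested pair such as early night simply receives zero probability, something no bare phrase-structure rule expresses.

```python
from collections import Counter

# Toy corpus, invented for illustration -- not the paper's ATIS sample.
corpus = ("early morning flight . early morning flight . "
          "early evening flight . late night flight .").split()

# Count adjacent word pairs and single words.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w2, w1):
    """Maximum-likelihood estimate of P(w2 | w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

# A purely lexical fact that grammar rules alone cannot predict:
# 'early morning' is frequent here, 'early night' never occurs.
p_morning = bigram_prob("morning", "early")
p_night = bigram_prob("night", "early")
```

The point of the sketch is only that the model's parameters are tied to specific word pairs, which is exactly the kind of lexical knowledge the note argues a parser must also capture.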
It is possible to describe broad semantic constraints on such modification: for example, early morning is a case of a time-adjective modifying a time-period, and morning flight is a case of a time-period modifying an event. Already we have an explosion of categories in the grammar, since we are talking not about nouns and adjectives, but about a fairly detailed subclassification of semantic types of nouns and adjectives. But even this degree of quasi-grammatical detail is insufficient, as Table 1 shows: the adjective-noun combination early night does not occur.

Table 1: Pre- and postmodifiers for time nominals in a 266k word ATIS sample.

                      early N   N flight(s)
    morning     143        33           597
    afternoon   146        25           504
    evening      51        12           215
    night        27         0           121

This dependency of syntactic combinability on particular lexical items is repeatedly observed across the grammar and lexicon. The lexicality problem has two aspects: one is representing the information, the other is acquiring it. There has recently been increasing work on both aspects of the problem. The approach described in this paper is but one of many possible approaches, designed with an emphasis on facilitating efficient parsing.
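Counts like those in Table 1 can be gathered with simple pattern bookkeeping over a token stream. The sketch below is a hypothetical reconstruction of that bookkeeping (the paper does not give its extraction code), run on a stand-in sentence rather than the ATIS sample.

```python
from collections import defaultdict

# Time nominals of interest, as in Table 1.
TIME_NOUNS = {"morning", "afternoon", "evening", "night"}

def modifier_counts(tokens):
    """Count the premodifier pattern 'early N' and the postmodifier
    pattern 'N flight(s)' for each time nominal in a token list."""
    counts = defaultdict(lambda: {"early N": 0, "N flight(s)": 0})
    for i, tok in enumerate(tokens):
        if tok in TIME_NOUNS:
            if i > 0 and tokens[i - 1] == "early":
                counts[tok]["early N"] += 1
            if i + 1 < len(tokens) and tokens[i + 1] in {"flight", "flights"}:
                counts[tok]["N flight(s)"] += 1
    return counts

# Stand-in query, not ATIS data.
tokens = "show me an early morning flight and a night flight".split()
c = modifier_counts(tokens)
```

Run over a large sample, counts of this shape are what reveal lexical gaps such as the missing early night.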
Similar papers
The Effect of Dependency Representation Scheme on Syntactic Language Modelling
There has been considerable work on syntactic language models, and they have advanced greatly over the last decade. Most of them have used a probabilistic context-free grammar (PCFG) or a dependency grammar (DG). In particular, DG has attracted more and more interest in recent years, since dependency parsing has achieved great success. While much work has evaluated the effects of different depen...
Probabilistic Parsing Action Models for Multi-Lingual Dependency Parsing
Deterministic dependency parsers use parsing actions to construct dependencies. These parsers do not compute the probability of the whole dependency tree; they only determine parsing actions step by step with a trained classifier. To globally model the parsing actions of all steps taken on the input sentence, we propose two kinds of probabilistic parsing action models that can compute the prob...
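The stepwise parsing actions this abstract refers to can be illustrated with a minimal arc-standard transition system (a common formulation of action-based dependency parsing; the action sequence below is hand-written for illustration, not produced by a trained classifier).

```python
def arc_standard(words, actions):
    """Apply a sequence of arc-standard actions to a sentence;
    return the resulting (head, dependent) arcs."""
    stack, buffer, arcs = [], list(words), []
    for act in actions:
        if act == "SHIFT":            # move next input word onto the stack
            stack.append(buffer.pop(0))
        elif act == "LEFT":           # second-top is a dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT":          # top is a dependent of second-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# Hand-written oracle sequence: 'flights' heads 'morning', 'book' heads 'flights'.
arcs = arc_standard(["book", "morning", "flights"],
                    ["SHIFT", "SHIFT", "SHIFT", "LEFT", "RIGHT"])
```

In a deterministic parser of the kind the abstract describes, each action would instead be chosen by a classifier from features of the current stack and buffer.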
Combine Constituent and Dependency Parsing via Reranking
This paper presents a reranking approach to combining constituent and dependency parsing, aimed at improving parsing performance on both sides. Most previous combination methods rely on complicated joint decoding to integrate graph- and transition-based dependency models. Instead, our approach makes use of a high-performance probabilistic context-free grammar (PCFG) model to output k-best candida...
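The general reranking scheme can be sketched generically: a base model emits k-best candidates with scores, and a second scorer adjusts the ranking. All names and scores below are placeholders, not the paper's actual models or features.

```python
def rerank(candidates, base_scores, rerank_score):
    """Return the candidate maximizing base score + reranker score."""
    return max(zip(candidates, base_scores),
               key=lambda cs: cs[1] + rerank_score(cs[0]))[0]

trees = ["tree_a", "tree_b", "tree_c"]   # k-best parses (placeholders)
base = [-2.0, -2.5, -4.0]                # base-model log-probabilities (invented)
bonus = {"tree_a": 0.0, "tree_b": 1.0, "tree_c": 0.5}  # reranker scores (invented)
best = rerank(trees, base, bonus.get)
```

The design point is that the expensive joint decoding is replaced by scoring a short candidate list, which is what makes reranking attractive as a combination method.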
Unsupervised Bayesian Parameter Estimation for Dependency Parsing
We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal prior as a prior over the grammar parameters. We derive a variational EM algorithm for that model...
Breaking the barrier of context-freeness
This paper presents a generative probabilistic dependency model of parallel texts that can be used for statistical machine translation and parallel parsing. Unlike syntactic models that are based on context-free dependency grammars, the dependency model proposed in this paper is based on a sophisticated notion of dependency grammar that is capable of modelling non-projective word order and isla...
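Non-projectivity, which this model is designed to handle, can be checked in a few lines: an arc is projective when every word between a head and its dependent is a descendant of that head, and a tree is non-projective when arcs cross. The checker below is a generic illustration, not the paper's machinery.

```python
def is_projective(heads):
    """heads[i] is the index of the head of word i (the root has head -1).
    Return True if no two dependency arcs cross."""
    n = len(heads)
    for d in range(n):
        h = heads[d]
        if h < 0:
            continue
        lo, hi = min(h, d), max(h, d)
        for k in range(lo + 1, hi):
            # every word strictly between h and d must be dominated by h
            a = k
            while a != -1 and a != h:
                a = heads[a]
            if a != h:
                return False
    return True

# Projective: word 1 heads words 0 and 2.
proj = is_projective([1, -1, 1])
# Non-projective: arcs 2->0 and 3->1 cross.
nonproj = is_projective([2, 3, -1, 2])
```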
Parsing with Lexicalized Probabilistic Recursive Transition Networks
We present a formalization of lexicalized Recursive Transition Networks which we call Automaton-Based Generative Dependency Grammar (gdg). We show how to extract a gdg from a syntactically annotated corpus, present a chart parser for gdg, and discuss different probabilistic models which are directly implemented in the finite automata and do not affect the parser.